Document Recognition System with Layout Structure Generator

Authors

  • Yoshitake Tsuji
  • Hiroyuki Kami
  • Masaaki Mizuno
  • Toshiyuki Tanaka
  • Haruhiko Tanaka
  • Masao Iwashita
  • Tsutomu Temma
Abstract

A document input system with character recognition is used to convert printed matter, such as books and magazines, into coded information. To improve the performance of such a system, an appropriate document structure analysis technique is indispensable. When storing data from general printed documents in a database, it is necessary to represent the document structure. Therefore, a document layout structure generation method is especially important (6). For this purpose, the authors have developed a document image structure analysis method that generates a layout structure and detects document elements such as characters, pictures and figures. This method was developed on a personal computer. Its usability is described in this paper.

Similar resources

Style-Directed Document Recognition

We are developing a document recognition system that can be tunably optimized for performance on documents of specific styles. We interactively generate XML to encode specific knowledge about a class of documents to be input to a recognition system. The encoding includes attributes of document logical structure as well as layout structure constraints. The encoding of document style is used to a...

Knowledge-Based System for Structured Document Recognition

This paper describes a document analysis system broadly consisting of a knowledge base, a blackboard and a set of tasks having their own set of specialists for segmentation, recognition and for inheritance. The knowledge base contains a generic hierarchical description of the document structure in terms of layout objects labeled logically. This allows the generation of hypothetic networks of li...

A Pattern-Based Method for Document Structure Recognition

One of the main goals of the CIDRE project is the design of an interactive document recognition system able to improve with use. In previous work, members of this project have already described software architecture issues [1, 6] as well as font and logical structure recognition algorithms [8, 2]. In this paper, we present a new method for document structure recognition based on geometrical an...

Document Analysis and Recognition

The aim of document image understanding is to extract and meaningfully classify individual data items from paper-based documents. To date, many methods and approaches have been proposed for recognizing various kinds of documents, for addressing technical problems in extending OCR, and for meeting the requirements of practical use. Of course, though the technical research issues in the early st...

Page Layout Classification Technique for Biomedical Documents

The structural layout information of scanned document pages is valuable for a wide range of document processing applications such as automatic document searching, document delivery and automated data entry. This paper describes the classification of scanned document pages into different classes of physical layout structures. The page layout classification technique proposed in this paper uses a...

Publication date: 1990